1. <text, operating system> (regexp, RE) One of the {wild
card} patterns used by Perl and other languages, following
Unix utilities such as grep, sed, and awk and editors such as
vi and Emacs. Regular expressions use conventions similar to
but more elaborate than those described under glob. A regular
expression is a sequence of characters with the following
meanings:
An ordinary character (not one of the special characters
discussed below) matches that character.
A backslash (\) followed by any special character matches the
special character itself. The special characters are:
"." matches any character except NEWLINE; "RE*" (where
the "*" is called the "Kleene star") matches zero or more
occurrences of RE. If there is any choice, the longest
leftmost matching string is chosen, in most regexp flavours.
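As a rough illustration of ordinary characters, backslash escapes, "." and "*", here is a minimal sketch using Python's re module. Python is only one flavour among many, and the sample strings are invented for the example.

```python
import re

# An ordinary character matches itself; "." matches any character
# except a newline.
dot_hit = re.search(r"c.t", "a cat") is not None      # True
dot_newline = re.search(r"c.t", "c\nt") is not None   # False

# A backslash removes the special meaning of the next character.
literal_dot = re.search(r"3\.14", "3.14") is not None   # True
escaped_miss = re.search(r"3\.14", "3x14") is not None  # False

# "*" matches zero or more occurrences of the preceding RE and,
# given a choice, takes as many as it can.
many = re.search(r"ab*", "xabbbc").group()   # "abbb"
none = re.search(r"ab*", "xac").group()      # "a"

print(dot_hit, dot_newline, literal_dot, escaped_miss, many, none)
```

The pattern strings are written as raw strings (r"...") so that Python itself does not interpret the backslashes before the regexp engine sees them.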
"^" at the beginning of an RE matches the start of a line and
"$" at the end of an RE matches the end of a line.
[string] matches any one character in that string. If the
first character of the string is a "^" it matches any
character except the remaining characters in the string (and
also usually excluding NEWLINE). "-" may be used to indicate
a range of consecutive ASCII characters.
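The anchors and bracket expressions above might be exercised as follows. This is a sketch in Python's flavour only, with made-up sample strings; the last line shows one place where Python departs from the "usually excluding NEWLINE" behaviour.

```python
import re

# "^" and "$" tie the match to the start and end of the line.
whole_line = re.search(r"^end$", "end") is not None     # True
mid_line = re.search(r"^end$", "the end") is not None   # False

# [string] matches any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regexp")               # ['e', 'e']

# A leading "^" negates the set; "-" gives a range of characters.
non_digits = re.findall(r"[^0-9]", "a1b2")              # ['a', 'b']
hex_run = re.search(r"^[0-9a-f]*", "beef!").group()     # 'beef'

# Flavours differ: in Python a negated set does match a newline.
negated_hits_newline = re.search(r"[^0-9]", "\n") is not None  # True

print(whole_line, mid_line, vowels, non_digits, hex_run)
```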
\( RE \) matches whatever RE matches and \n, where n is a
digit, matches whatever was matched by the RE between the nth
\( and its corresponding \) earlier in the same RE. Many
flavours use ( RE ) instead of \( RE \).
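A short sketch of grouping and back-references, again assuming Python's flavour, which is one of those that writes groups as ( RE ) without backslashes. The input strings are illustrative.

```python
import re

# (.) captures one character; \1 must then match that same text again.
repeat = re.search(r"(.)\1", "hello")
pair = repeat.group()      # "ll", the whole match
captured = repeat.group(1) # "l", what group 1 matched

# A back-reference matches the captured text, not the pattern:
# "ab" followed by "ab" matches, "ab" followed by "cd" does not.
same_twice = re.search(r"(..)\1", "abab") is not None  # True
different = re.search(r"(..)\1", "abcd") is not None   # False

print(pair, captured, same_twice, different)
```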
The concatenation of REs is an RE that matches the
concatenation of the strings matched by each RE. RE1 | RE2
matches whatever RE1 or RE2 matches.
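Concatenation and alternation can be tried out with a sketch like the one below, using Python's flavour and invented sample text. Note that concatenation binds more tightly than "|", so "cat|dog" means "cat" or "dog".

```python
import re

# RE1 | RE2 matches whatever either alternative matches.
pets = re.findall(r"cat|dog", "cat, dog, bird")   # ['cat', 'dog']

# Concatenation: "gr", then "a" or "e", then "y".
greys = [m.group() for m in re.finditer(r"gr(a|e)y", "gray grey groy")]
# ['gray', 'grey']

print(pets, greys)
```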
\< matches the beginning of a word and \> matches the end of a
word. In many flavours of regexp, \> and \< are replaced by
"\b", the special character for "word boundary".
RE{m} matches m occurrences of RE. RE{m,} matches m or more
occurrences of RE. RE{m,n} matches between m and n
occurrences.
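Word boundaries and repetition counts might look like this in practice. The sketch assumes Python's flavour, which provides "\b" but not the separate start-of-word and end-of-word forms, and which writes counts as RE{m}, RE{m,} and RE{m,n}. The sample strings are made up.

```python
import re

# \b matches at a word boundary, so only the standalone word is found.
whole_word = re.findall(r"\bcat\b", "cat concat cats")   # ['cat']

# RE{m}: exactly m occurrences.
exactly = re.fullmatch(r"a{3}", "aaa") is not None       # True

# RE{m,}: m or more occurrences.
at_least = re.fullmatch(r"a{2,}", "a") is not None       # False

# RE{m,n}: between m and n occurrences, taking as many as possible.
bounded = re.findall(r"[0-9]{2,4}", "1 22 333 55555")    # ['22', '333', '5555']

print(whole_word, exactly, at_least, bounded)
```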
The exact details of how regexp will work in a given
application vary greatly from flavour to flavour. A
comprehensive survey of regexp flavours is found in Friedl
1997 (see below).
[Jeffrey E.F. Friedl, "{Mastering Regular Expressions
(http://enterprise.ic.gc.ca/~jfriedl/regex/index.html)}",
O'Reilly, 1997].
2. Any description of a pattern composed from combinations of
symbols and the three operators:
Concatenation - pattern A concatenated with B matches a match
for A followed by a match for B.
Or - pattern A-or-B matches either a match for A or a match
for B.
Closure - zero or more matches for a pattern.
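The three operators are enough to build a working matcher. The sketch below is one possible approach, not a method taken from the references: it uses Brzozowski derivatives, and every class and function name in it (Pattern, Symbol, Concat, Or, Closure, nullable, derive, matches) is hypothetical, chosen for this example. It does no simplification, so it suits short inputs only.

```python
from dataclasses import dataclass

class Pattern:
    """Base class for the hypothetical pattern types below."""

@dataclass(frozen=True)
class Empty(Pattern):      # matches no string at all
    pass

@dataclass(frozen=True)
class Epsilon(Pattern):    # matches only the empty string
    pass

@dataclass(frozen=True)
class Symbol(Pattern):     # matches one given character
    char: str

@dataclass(frozen=True)
class Concat(Pattern):     # a match for left followed by a match for right
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class Or(Pattern):         # a match for left or a match for right
    left: Pattern
    right: Pattern

@dataclass(frozen=True)
class Closure(Pattern):    # zero or more matches for inner
    inner: Pattern

def nullable(p):
    """Return True if p can match the empty string."""
    if isinstance(p, (Epsilon, Closure)):
        return True
    if isinstance(p, Concat):
        return nullable(p.left) and nullable(p.right)
    if isinstance(p, Or):
        return nullable(p.left) or nullable(p.right)
    return False  # Empty and Symbol

def derive(p, c):
    """Return a pattern for what may follow c in a match for p."""
    if isinstance(p, Symbol):
        return Epsilon() if p.char == c else Empty()
    if isinstance(p, Concat):
        first = Concat(derive(p.left, c), p.right)
        # If left can be empty, c may instead begin a match for right.
        return Or(first, derive(p.right, c)) if nullable(p.left) else first
    if isinstance(p, Or):
        return Or(derive(p.left, c), derive(p.right, c))
    if isinstance(p, Closure):
        return Concat(derive(p.inner, c), p)
    return Empty()  # Empty and Epsilon

def matches(p, text):
    """Return True if p matches the whole of text."""
    for c in text:
        p = derive(p, c)
    return nullable(p)

# The pattern (a|b)*c built from the three operators.
pattern = Concat(Closure(Or(Symbol("a"), Symbol("b"))), Symbol("c"))
print(matches(pattern, "abbac"), matches(pattern, "abca"))
```

Here matches consumes the input one character at a time, and the text is accepted if the pattern left over at the end can match the empty string.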
The earliest form of regular expressions (and the term itself)
was invented by mathematician Stephen Cole Kleene in the
mid-1950s, as a notation to easily manipulate "regular sets",
formal descriptions of the behaviour of {finite state
machines}, in regular algebra.
[S.C. Kleene, "Representation of events in nerve nets and
finite automata", 1956, Automata Studies. Princeton].
[J.H. Conway, "Regular algebra and finite machines", 1971, Eds
Chapman & Hall].
[Sedgewick, "Algorithms in C", page 294].
(2004-02-01)